Minimal Cost Complexity Pruning of Meta-Classifiers
Authors
Abstract
Integrating multiple learned classification models (classifiers) computed over large and (physically) distributed data sets has been demonstrated as an effective approach to scaling inductive learning techniques, while also boosting the accuracy of the individual classifiers. These gains, however, come at the expense of an increased demand for run-time system resources. The final ensemble meta-classifier may consist of a large collection of base classifiers that require increased memory resources while also slowing down classification throughput: to classify unlabeled instances, predictions must be generated from all base classifiers before the meta-classifier can produce its final classification. The throughput (prediction rate) of a meta-classifier is of significant importance in real-time systems, such as e-commerce or intrusion detection.

This extended abstract describes a pruning algorithm that is independent of the combining scheme and is used to discard redundant classifiers without degrading the overall predictive performance of the pruned meta-classifier. To determine the most effective base classifiers, the algorithm takes advantage of the minimal cost-complexity pruning method of the CART learning algorithm (Breiman et al. 1984), which is guaranteed to find the best (with respect to misclassification cost) pruned tree of a specific size (number of terminal nodes) of an initial unpruned decision tree. An alternative pruning method, based on Rissanen's minimum description length, is described in (Quinlan & Rivest 1989).

Minimal cost-complexity pruning associates a complexity parameter with the number of terminal nodes of a decision tree. It prunes decision trees by minimizing a linear combination of the complexity (size) of the tree and its misclassification cost estimate (error rate). The degree of pruning is controlled by adjusting the weight of the complexity parameter: an increase of this weight results in heavier pruning.

Pruning an arbitrary meta-classifier consists of three stages. First, we construct a decision tree model (e.g. CART) of the original meta-classifier by learning its input/output behavior. This new model (a decision...
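To make the pruning criterion concrete: in Breiman et al. (1984), minimal cost-complexity pruning selects the subtree T minimizing R_alpha(T) = R(T) + alpha * |T~|, where R(T) is the misclassification cost estimate, |T~| is the number of terminal nodes, and alpha >= 0 is the complexity weight; larger alpha yields smaller trees. Below is a minimal sketch of the idea using scikit-learn's cost-complexity pruning (its ccp_alpha parameter plays the role of alpha). The data here is synthetic: the base-classifier predictions and the stand-in meta-classifier output are hypothetical placeholders, not the paper's actual setup.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical inputs: predictions of 10 base classifiers on 1000
# validation instances, and a stand-in meta-classifier output (here a
# simple majority vote) that the surrogate tree learns to reproduce.
base_preds = rng.integers(0, 2, size=(1000, 10))
meta_preds = (base_preds.sum(axis=1) > 5).astype(int)

# Stage 1: model the meta-classifier's input/output behavior with a
# decision tree, and compute the full cost-complexity pruning path.
surrogate = DecisionTreeClassifier(random_state=0)
path = surrogate.cost_complexity_pruning_path(base_preds, meta_preds)

# Stage 2: increasing alpha produces progressively heavier pruning.
for alpha in path.ccp_alphas[:: max(1, len(path.ccp_alphas) // 5)]:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    pruned.fit(base_preds, meta_preds)
    # Base classifiers whose prediction columns still appear as split
    # attributes in the pruned tree are the ones to keep; leaves are
    # marked with feature index -2 and are filtered out here.
    kept = np.unique(pruned.tree_.feature[pruned.tree_.feature >= 0])
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  kept={kept}")

In the paper's setting, the base classifiers that never appear as split attributes in the pruned surrogate tree are the candidates for discarding; the sketch above only mirrors that selection step under the stated assumptions.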
Similar papers
Cost Complexity Pruning of Ensemble Classifiers
In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the means to efficiently scale learning to large datasets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system...
Full text
Pruning Meta-Classifiers in a Distributed Data Mining System CUCS-032-97
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from independent and (possibly) inherently distributed databases. Although meta-learning promotes scalability and accuracy in a simple and straightforward manner, brute force meta-learning techniques can resu...
Full text
Pruning Meta-Classifiers in a Distributed Data Mining System
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from independent and (possibly) inherently distributed databases. Although meta-learning promotes scalability and accuracy in a simple and straightforward manner, brute force meta-learning techniques can resul...
Full text
Pruning Classifiers in a Distributed Meta-Learning System
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (concepts) derived in parallel from independent and (possibly) inherently distributed databases. Although meta-learning promotes scalability and accuracy in a simple and straightforward manner, brute force meta-learning techniques can re...
Full text
An Empirical Comparison of Pruning Methods for Ensemble Classifiers
Many researchers have shown that ensemble methods such as Boosting and Bagging improve the accuracy of classification. Boosting and Bagging perform well with unstable learning algorithms such as neural networks or decision trees. Pruning decision tree classifiers is intended to make trees simpler and more comprehensible and to avoid over-fitting. However, it is known that pruning individual classif...
Full text